Smart Paper Notes

Online pseudo labeling for polyp segmentation with momentum networks

Toan Pham Van , Linh Bao Doan , Thanh Tung Nguyen , Duc Trung Tran , Quan Van Nguyen , Dinh Viet Sang

Category: Computer Vision

2022-09-29

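Semantic segmentation is an essential task in developing medical image diagnosis systems. However, building an annotated medical dataset is expensive, so semi-supervised methods are important in this setting. In semi-supervised learning, the quality of the labels plays a crucial role in model performance. In this work, we present a new pseudo-labeling strategy that improves the quality of the pseudo labels used to train student networks. We follow the multi-stage semi-supervised training approach, which trains a teacher model on a labeled dataset and then uses the trained teacher to render pseudo labels for student training. The key difference between previous methods and ours is that we update the teacher model during student training. As a result, the pseudo labels are re-rendered and become more precise as training progresses. We also propose a simple but effective strategy for improving pseudo-label quality using a momentum model, a slowly updated copy of the original model maintained during training. By combining the momentum model with re-rendering of pseudo labels during student training, we achieve an average Dice score of 84.1% on five datasets (Kvasir, CVC-ClinicDB, ETIS-LaribPolypDB, CVC-ColonDB, and CVC-300) with only 20% of the dataset used as labeled data. Our results surpass common practice by 3% and even approach fully supervised results on some datasets. Our source code and pre-trained models are available at https://github.com/sun-asterisk-research/online学习SSL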

Translated by Google Translate
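
The momentum model and the Dice score mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the "slow copy" is an exponential moving average (EMA) of the original model's weights; the function names, the flat weight lists, and the decay value are hypothetical and are not taken from the authors' code.

```python
def ema_update(momentum_weights, model_weights, decay=0.99):
    """Move each momentum weight a small step toward the current model weight.

    Assumption: the momentum model is an exponential moving average.
    """
    return [decay * m + (1.0 - decay) * w
            for m, w in zip(momentum_weights, model_weights)]


def dice_score(pred_mask, true_mask, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    return (2.0 * intersection + eps) / (sum(pred_mask) + sum(true_mask) + eps)


# Toy run: the momentum copy lags behind a model whose weights stay fixed.
momentum = [0.0, 0.0]
for _ in range(3):
    model = [1.0, -1.0]  # stand-in for the weights after an optimizer step
    momentum = ema_update(momentum, model, decay=0.5)

toy_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

With a decay close to 1, the momentum copy changes slowly, which is one plausible reason it yields steadier pseudo labels than the rapidly changing model it tracks.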

Multiple Perturbation Attack: Attack Pixelwise Under Different $\ell_p$-norms For Better Adversarial Performance

Ngoc N. Tran , Anh Tuan Bui , Dinh Phung , Trung Le

Category: Computer Vision | Machine Learning

2022-12-05

Adversarial machine learning has recently been both a major concern and a hot topic, especially given the ubiquitous use of deep neural networks. Adversarial attacks and defenses are often likened to a cat-and-mouse game in which defenders and attackers evolve over time. On the one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, achieving that goal requires devising even stronger adversarial attacks to challenge these defense models. Most existing attacks employ a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and perform steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. Each of these $\ell_p$ attacks has its own pros and cons, and no single attack can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we propose a natural approach: combining various $\ell_p$ gradient projections at the pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize attack performance while maintaining the overall visual imperceptibility of adversarial examples. Finally, through experiments on standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms while keeping the adversarial examples visually clean.
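
To make the idea of mixing $\ell_p$ update directions per pixel concrete, here is a minimal sketch. The three direction functions are the standard steepest-ascent directions for $p\in\{1,2,\infty\}$; the per-pixel weights are fixed by hand here, whereas the abstract says they are learned, and the projection back onto a norm ball is omitted. All names are hypothetical.

```python
import math


def l1_direction(grad):
    """Steepest ascent under an l1 budget: move only the largest-magnitude coordinate."""
    k = max(range(len(grad)), key=lambda i: abs(grad[i]))
    return [math.copysign(1.0, g) if i == k else 0.0 for i, g in enumerate(grad)]


def l2_direction(grad):
    """Steepest ascent under an l2 budget: the normalized gradient."""
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    return [g / norm for g in grad]


def linf_direction(grad):
    """Steepest ascent under an l-infinity budget: the sign of the gradient."""
    return [float((g > 0) - (g < 0)) for g in grad]


def mixed_step(grad, pixel_weights, step=1.0):
    """Combine the three directions with one (w1, w2, winf) triple per pixel.

    Assumption: weights are given; in the paper they would be learned.
    """
    d1, d2, dinf = l1_direction(grad), l2_direction(grad), linf_direction(grad)
    return [step * (w1 * a + w2 * b + winf * c)
            for (w1, w2, winf), a, b, c in zip(pixel_weights, d1, d2, dinf)]


grad = [3.0, -4.0]                             # toy gradient for a 2-pixel image
weights = [(0.0, 1.0, 0.0), (0.5, 0.0, 0.5)]   # pixel 0: pure l2; pixel 1: l1 + l-inf
delta = mixed_step(grad, weights)
```

Here the first pixel follows only the $\ell_2$ direction, while the second blends the $\ell_1$ and $\ell_\infty$ directions, which illustrates how different pixels can be updated in different ways.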

Continual Learning with Optimal Transport based Mixture Model

Quyen Tran , Hoang Phan , Khoat Than , Dinh Phung , Trung Le

Category: Machine Learning | Computer Vision

2022-11-30

Online Class Incremental Learning (CIL) is a challenging setting in Continual Learning (CL), in which data for new tasks arrive as streams and online learning models must handle the incoming data without revisiting previous streams. Existing works characterize each class with a single centroid that is adapted to the incoming data stream. This approach can be limiting when the incoming data stream of a class is naturally multimodal. To address this issue, we first propose an online mixture model learning approach grounded in well-established optimal transport theory (OT-MM). Specifically, the centroids and covariance matrices of the mixture model are adapted incrementally according to the incoming data streams. The advantages are two-fold: (i) we can characterize complex data streams more accurately, and (ii) by using the centroids that OT-MM produces for each class, we can estimate the similarity of an unseen example to each class more reasonably at inference time. Moreover, to combat catastrophic forgetting in the CIL scenario, we further propose Dynamic Preservation. After the dynamic preservation technique is applied across data streams, the latent representations of the classes in the old and new tasks become more compact and more separated from each other. Together with a contraction feature extractor, this technique helps the model mitigate catastrophic forgetting. Experimental results on real-world datasets show that our proposed method significantly outperforms the current state-of-the-art baselines.
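
The following toy sketch illustrates why a single centroid per class can fail on multimodal data and how several centroids per class help at inference time. It uses plain nearest-centroid distance in one dimension and a running-mean update; it does not implement the optimal-transport-based update of OT-MM, and all names and numbers are hypothetical.

```python
def nearest_class(x, centroids_by_class):
    """Return the class whose closest centroid is nearest to x (1-D toy case)."""
    return min(centroids_by_class,
               key=lambda c: min(abs(x - m) for m in centroids_by_class[c]))


def update_centroid(centroid, count, x):
    """Incremental running-mean update of one centroid with a new sample x."""
    count += 1
    return centroid + (x - centroid) / count, count


# Class A has two modes near -5 and +5; class B sits near 0.5.
single = {"A": [0.0], "B": [0.5]}        # one centroid per class (mean of A's modes)
multi = {"A": [-5.0, 5.0], "B": [0.5]}   # one centroid per mode

label_single = nearest_class(3.0, single)
label_multi = nearest_class(3.0, multi)
```

A sample at 3.0 lies closer to class A's right-hand mode than to class B, but the single-centroid summary of A collapses both modes to 0.0 and the sample is assigned to B; the multi-centroid summary assigns it to A.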

LG-Hand: Advancing 3D Hand Pose Estimation with Locally and Globally Kinematic Knowledge

Tu Le-Xuan , Trung Tran-Quang , Thi Ngoc Hien Doan , Thanh-Hai Tran

Category: Computer Vision

2022-11-06

3D hand pose estimation from RGB images is difficult because depth information is hard to obtain. A great deal of attention has therefore been paid to estimating 3D hand pose from 2D hand joints. In this paper, we leverage spatial-temporal Graph Convolutional Neural Networks and propose LG-Hand, a powerful method for 3D hand pose estimation. Our method incorporates both spatial and temporal dependencies into a single process. We argue that kinematic information plays an important role in the performance of 3D hand pose estimation. We therefore introduce two new objective functions, Angle loss and Direction loss, to take the hand structure into account. While Angle loss covers local kinematic information, Direction loss handles global kinematic information. LG-Hand achieves promising results on the First-Person Hand Action Benchmark (FPHAB) dataset. We also perform an ablation study to show the efficacy of the two proposed objective functions.

FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning

Nang Hung Nguyen , Phi Le Nguyen , Duc Long Nguyen , Trung Thanh Nguyen , Thuy Dung Nguyen , Huy Hieu Pham , Truong Thao Nguyen

Category: Machine Learning | Computer Vision

2022-08-04

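The uneven distribution of local data across different edge devices (clients) slows model training and reduces accuracy in federated learning. The naive federated learning (FL) strategy and most alternative solutions attempt to achieve more fairness by weighted aggregation of deep learning models across clients. This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew, in which groups of clients have local data with similar distributions, causing the global model to converge to an over-fitted solution. To deal with non-IID data, and cluster-skewed data in particular, we propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor (which is used as the weight in the aggregation process). Extensive experiments on a suite of federated datasets confirm that the proposed FedDRL compares favorably with the FedAvg and FedProx methods, e.g., improving on them by up to 4.05% and 2.17% on average, respectively, on the CIFAR-100 dataset.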
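
The aggregation step that the impact factors feed into can be sketched as a normalized weighted average of client parameters. This is a minimal illustration with hypothetical names and flat parameter lists; how FedDRL's reinforcement learning agent chooses the factors is not shown. Setting the factors to the clients' sample counts gives the usual FedAvg weighting.

```python
def aggregate(client_params, impact_factors):
    """Weighted average of client parameter vectors.

    client_params: list of equal-length lists, one per client.
    impact_factors: one non-negative number per client (normalized here).
    """
    total = sum(impact_factors)
    weights = [a / total for a in impact_factors]
    n_params = len(client_params[0])
    return [sum(w * params[j] for w, params in zip(weights, client_params))
            for j in range(n_params)]


clients = [[1.0, 2.0], [3.0, 4.0]]   # toy parameters from two clients
factors = [1.0, 3.0]                 # e.g. sample counts (FedAvg) or learned factors
global_params = aggregate(clients, factors)
```

Because the factors are normalized inside the function, only their ratios matter: the second client here contributes three times as much as the first.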

An Additive Instance-Wise Approach to Multi-class Model Interpretation

Vy Vo , Van Nguyen , Trung Le , Quan Hung Tran , Gholamreza Haffari , Seyit Camtepe , Dinh Phung

Category: Machine Learning | Artificial Intelligence

2022-07-07

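Interpretable machine learning offers insight into which factors drive a given prediction of a black-box system and whether that prediction can be trusted for high-stakes decisions or large-scale deployment. Existing methods mainly focus on selecting explanatory input features and follow either locally additive or instance-wise approaches. Additive models use heuristically sampled perturbations to learn instance-specific explainers sequentially. The process is therefore inefficient and susceptible to poorly conditioned samples. Instance-wise techniques, in contrast, directly learn local sampling distributions and can leverage global information from other inputs. However, because they rely strictly on a pre-defined number of selected features, they can only interpret single-class predictions and suffer from inconsistency across different settings. This work exploits the strengths of both approaches and proposes a global framework for learning local explanations for multiple target classes simultaneously. We also propose an adaptive inference strategy to determine the optimal number of features for a specific instance. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness while achieving a high level of brevity across various datasets and black-box model architectures.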

An FPGA-based Solution for Convolution Operation Acceleration

Trung Dinh Pham , Bao Gia Bach , Lam Trinh Luu , Minh Dinh Nguyen , Hai Duc Pham , Khoa Bui Anh , Xuan Quang Nguyen , Cuong Pham Quoc

Category: Artificial Intelligence | Machine Learning

2022-06-09

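Hardware-based acceleration is a widely used way to speed up many computationally intensive mathematical operations. This paper proposes an FPGA-based architecture to accelerate the convolution operation, a complex and expensive computing step that appears in many convolutional neural network models. We target the design at the standard convolution operation, intending to launch the product as an edge-AI solution. The purpose of the project is to produce an FPGA IP core that can process one convolutional layer at a time. The architecture uses Verilog HDL as its primary design language, which allows system developers to deploy the IP core. Experimental results show that a single computing core synthesized on a simple edge-computing FPGA board can deliver 0.224 GOPS. When the board is fully utilized, 4.48 GOPS can be achieved.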
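
As a software reference for the operation being accelerated, the sketch below computes a standard 2-D convolution as used in CNN layers (no padding, stride 1, no kernel flip) and counts its multiply-accumulate operations. It also works out the ratio between the two throughput figures quoted above. The function name is hypothetical, and the reading of that ratio as a core count is an inference, not something the abstract states.

```python
def conv2d_valid(image, kernel):
    """Direct 2-D convolution (CNN style) with stride 1 and no padding.

    Returns the output feature map and the number of multiply-accumulates.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
                    macs += 1
            out[r][c] = acc
    return out, macs


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map, mac_count = conv2d_valid(image, kernel)

# Ratio of full-board throughput to single-core throughput (both in GOPS).
throughput_ratio = 4.48 / 0.224
```

The ratio comes out at about 20, which would be consistent with roughly 20 such cores fitting on the board if throughput scaled linearly with the number of cores.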

Novel projection schemes for graph-based Light Field coding

Bach Gia Nguyen , Chanh Minh Tran , Tho Nguyen Duc , Tan Xuan Phan , Kamioka Eiji

Category: Computer Vision

2022-06-09

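In light field compression, graph-based coding is a powerful way to exploit signal redundancy along irregular shapes and achieves good energy compaction. However, apart from the high time complexity of processing high-dimensional graphs, the graph construction method is highly sensitive to the accuracy of the disparity information between viewpoints. In real-world light fields and in synthetic light fields generated by computer software, using disparity information for super-ray projection can suffer from inaccuracy caused by the vignetting effect and by large disparity between views, respectively. This paper introduces two novel projection schemes that yield smaller errors in the disparity information; one of them also significantly reduces computation time for both the encoder and the decoder. Experimental results show that the proposals considerably improve the projection quality of super-pixels across views, as well as rate-distortion performance, compared with the original projection scheme and with HEVC-based or JPEG Pleno-based coding approaches.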

Stochastic Multiple Target Sampling Gradient Descent

Hoang Phan , Ngoc Tran , Trung Le , Toan Tran , Nhat Ho , Dinh Phung

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-06-04

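Sampling from an unnormalized target distribution is a fundamental problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of particles to approximate the distribution of interest. Furthermore, an analysis of its asymptotic properties shows that SVGD reduces exactly to a single-objective optimization problem and can be viewed as a probabilistic version of that problem. A natural question then arises: "Can we derive a probabilistic version of multi-objective optimization?" To answer this question, we propose Stochastic Multiple Target Sampling Gradient Descent (MT-SGD), which enables us to sample from multiple unnormalized target distributions. Specifically, MT-SGD constructs a flow of intermediate distributions that gradually orient toward the multiple target distributions, which allows the sampled particles to move to the joint high-likelihood region of the target distributions. Interestingly, the asymptotic analysis shows that, as expected, our approach reduces exactly to the multiple-gradient descent algorithm for multi-objective optimization. Finally, we conduct comprehensive experiments to demonstrate the merit of our approach for multi-task learning.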
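
For background, the single-target SVGD update that MT-SGD builds on can be sketched in one dimension as follows. This is standard SVGD with an RBF kernel of fixed bandwidth, not the multi-target MT-SGD algorithm; the target (a standard normal), the step size, the bandwidth, and the function names are illustrative assumptions.

```python
import math


def svgd_step(particles, score, step=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles.

    score(x) is the gradient of the log target density, so the target
    only needs to be known up to its normalizing constant.
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2.0 * bandwidth ** 2))
            attract = k * score(xj)                     # pulls toward high density
            repulse = -(xj - xi) / bandwidth ** 2 * k   # kernel gradient, spreads particles
            phi += attract + repulse
        updated.append(xi + step * phi / n)
    return updated


def standard_normal_score(x):
    """Gradient of log N(0, 1) up to a constant: d/dx (-x^2 / 2) = -x."""
    return -x


particles = [2.0 + 0.1 * i for i in range(20)]   # deliberately far from the target
for _ in range(500):
    particles = svgd_step(particles, standard_normal_score)

mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
```

With a single particle the kernel term equals 1 and the repulsive term vanishes, so the update becomes plain gradient ascent on the log density, which matches the abstract's remark that SVGD reduces to single-objective optimization.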

ColonFormer: An Efficient Transformer based Method for Colon Polyp Segmentation

Nguyen Thanh Duc , Nguyen Thi Oanh , Nguyen Thi Thuy , Tran Minh Triet , Dinh Viet Sang

Category: Computer Vision

2022-05-17

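Identifying polyps is a challenging problem for the automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional neural networks (CNNs), transformers, and their combinations have been proposed to segment polyps, with promising results. However, these approaches either are limited to modeling only the local appearance of polyps or lack multi-level features for spatial dependency in the decoding process. This paper proposes a novel network, ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture capable of modeling long-range semantic information in both the encoder and decoder branches. The encoder is a lightweight transformer-based architecture for modeling global semantic relations at multiple scales. The decoder is a hierarchical network structure designed to learn multi-level features that enrich the feature representation. In addition, a new skip connection technique is added to refine the boundaries of polyp objects in the global map for accurate segmentation. Extensive experiments have been conducted on five popular benchmark datasets for polyp segmentation: Kvasir, CVC-Clinic DB, CVC-ColonDB, CVC-T, and ETIS-Larib. Experimental results show that ColonFormer outperforms other state-of-the-art methods on all benchmark datasets.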
